Abstract

We consider a linear regression model with regression parameter $\beta = (\beta_1, \ldots, \beta_p)$
and independent and identically $N(0, \sigma^2)$ distributed errors. Suppose that the parameter
of interest is $\theta = a^T \beta$ where $a$ is a specified vector. Define the parameter
$\tau = c^T \beta - t$ where the vector $c$ and the number $t$ are specified and $a$ and $c$ are
linearly independent. Also suppose that we have uncertain prior information that
$\tau = 0$. We present a new frequentist $1 - \alpha$ confidence interval for $\theta$ that utilizes this
prior information. We require this confidence interval to (a) have endpoints that are
continuous functions of the data and (b) coincide with the standard $1 - \alpha$ confidence
interval when the data strongly contradict this prior information. This interval is
optimal in the sense that it has minimum weighted average expected length where
the largest weight is given to this expected length when $\tau = 0$. This minimization
leads to an interval whose expected length (a) is relatively small when the prior
information about $\tau$ is correct and (b) has a maximum value that is not too large.
The following problem will be used to illustrate the application of this new
confidence interval. Consider a $2 \times 2$ factorial experiment with 20 replicates. Suppose
that the parameter of interest $\theta$ is a specified simple effect and that we have uncertain
prior information that the two-factor interaction is zero. Our aim is to find a
frequentist 0.95 confidence interval for $\theta$ that utilizes this prior information.
1. Introduction

Consider the linear regression model $Y = X\beta + \varepsilon$, where $Y$ is a random
$n$-vector of responses, $X$ is a known $n \times p$ matrix with linearly independent columns,
$\beta = (\beta_1, \ldots, \beta_p)$ is an unknown parameter vector and $\varepsilon \sim N(0, \sigma^2 I_n)$ where $\sigma^2$ is
an unknown positive parameter. Suppose that the parameter of interest is $\theta = a^T \beta$
where $a$ is a specified $p$-vector ($a \ne 0$). Define the parameter $\tau = c^T \beta - t$ where the
vector $c$ and the number $t$ are specified and $a$ and $c$ are linearly independent. Also
suppose that previous experience with similar data sets and/or expert opinion and
scientific background suggest that $\tau = 0$. In other words, suppose that we have
uncertain prior information that $\tau = 0$. Of course, this includes the particular case
that $c = (0, \ldots, 0, 1)$ and $t = 0$, so that the uncertain prior information is that
$\beta_p = 0$. Our aim is to find a frequentist $1 - \alpha$ confidence interval (i.e. a confidence
interval whose coverage probability has infimum $1 - \alpha$) for $\theta$ that utilizes this prior
information, based on an observation of $Y$.

An attempt to incorporate the uncertain prior information that $\tau = 0$ into the
construction of a $1 - \alpha$ confidence interval for $\theta$ is as follows. We carry out a
preliminary test of the null hypothesis that $\tau = 0$ against the alternative hypothesis
that $\tau \ne 0$. If this null hypothesis is accepted then the confidence interval is
constructed assuming that it was known a priori that $\tau = 0$; otherwise the standard
$1 - \alpha$ confidence interval for $\theta$ is used. We call this the naive $1 - \alpha$ confidence
interval for $\theta$. This confidence interval is based on a false assumption and so we
expect that its minimum coverage probability will not necessarily be $1 - \alpha$. This
minimum coverage probability has been investigated by Giri and Kabaila (2008),
Kabaila (1998, 2005a), Kabaila and Giri (2009a) and Kabaila and Leeb (2006). In
many cases this minimum is far below $1 - \alpha$, showing that this confidence interval
is completely inadequate. So, the naive $1 - \alpha$ confidence interval fails to utilize the
prior information that $\tau = 0$.

Whilst the naive $1 - \alpha$ confidence interval for $\theta$ fails abysmally to utilize the
prior information that $\tau = 0$, its form (as described in Section 2) will be used to
provide some motivation for the new confidence interval described in Section 3.
Similarly to Hodges and Lehmann (1952), Bickel (1983, 1984), Kabaila (1998), Kabaila
(2005b), Farchione and Kabaila (2008), Kabaila and Tuck (2008) and Kabaila and
Giri (2009b), our aim is to utilize the uncertain prior information in the frequentist
inference of interest, whilst providing a safeguard in case this prior information
happens to be incorrect. We assess a $1 - \alpha$ confidence interval for $\theta$ using the ratio
(expected length of this confidence interval)/(expected length of standard $1 - \alpha$
confidence interval). We call this ratio the scaled expected length of this confidence
interval. In Section 3 we describe a new $1 - \alpha$ confidence interval for $\theta$ that utilizes
the prior information. This interval has endpoints that are continuous functions of
the data and it coincides with the standard $1 - \alpha$ confidence interval when the data
strongly contradict the prior information. This interval is optimal in the sense that
it has minimum weighted average expected length where the largest weight is given
to this expected length when $\tau = 0$. This minimization leads to an interval whose
scaled expected length (a) is smaller than 1 when the prior information about $\tau$ is
correct and (b) has a maximum value that is not too much larger than 1. The idea of
minimizing a weighted average expected length of a confidence interval, subject to a
coverage probability inequality constraint, appears to have been first used by
Pratt (1961).

In Section 4 we consider the following scenario. Suppose that a $2 \times 2$ factorial
experiment, with factors labeled A and B and with more than 1 replicate, has been
conducted. Also suppose that our interest is solely in the simple effect of changing
factor A from low to high when factor B is low. Consider, for example, the case that
factor A (B) being low or high corresponds to the absence or presence of treatment A
(B), respectively. Our interest may be solely in the effect of treatment A compared
to no treatment (cf. Hung et al (1995)). In other words, the parameter of interest $\theta$
is the simple effect (expected response when factor A is high and factor B is low) -
(expected response when factor A is low and factor B is low). In this case, $p = 4$ and
we identify $\tau$ with the two-factor interaction. Suppose that previous experience with
similar data sets and/or expert opinion and scientific background suggest that the
two-factor interaction is zero. In a $2 \times 2$ factorial clinical trial comparing two drugs
whose presumed effects are on completely different systems and/or diseases, it seems
reasonable to suppose that we have uncertain prior information that the two-factor
interaction is zero (Stampfer et al (1985), Steering Committee of the Physicians'
Health Study Research Group (1988), Buring and Hennekens (1990) and Hung
et al (1995)). For an example of the elicitation of uncertain prior information in
a factorial experiment via expert opinion and scientific background in a chemical
context see Dube et al (1996).

An attempt to utilize the uncertain prior information that the two-factor
interaction is zero is to use a naive $1 - \alpha$ confidence interval for $\theta$ constructed using the
following preliminary test. The preliminary test is of the null hypothesis that the
two-factor interaction is zero against the alternative hypothesis that the two-factor
interaction is non-zero. This confidence interval has a minimum coverage probability
that is far below $1 - \alpha$, showing that it is completely inadequate. As an illustration,
consider the case that the number of replicates is 20, $1 - \alpha = 0.95$ and the preliminary
hypothesis test has level of significance 0.05. We find, using the methodology
of Kabaila (1998, 2005a) or Giri and Kabaila (2008) or Kabaila and Giri (2009a),
that the minimum coverage probability of this confidence interval is 0.7306. The
poor coverage properties of the naive confidence interval are presaged by the poor
properties of some other inferences carried out after this preliminary test, see Fabian
(1991), Shaffer (1991) and Ng (1994) (cf. Neyman (1935), Bohrer and Sheft (1979)
and Traxler (1976)).

The properties of the new confidence interval, described in Section 3, are
illustrated in Section 4 by a detailed analysis of the $2 \times 2$ factorial experiment example
with 20 replicates and $1 - \alpha = 0.95$. Define the parameter $\gamma = \tau / \sqrt{\mathrm{var}(\hat\tau)}$, where
$\hat\tau$ denotes the least squares estimator of $\tau$. As proved in Section 3, the coverage
probability of the new confidence interval for $\theta$ is an even function of $\gamma$. The top
panel of Figure 3 is a plot of the coverage probability of the new 0.95 confidence
interval for $\theta$ as a function of $\gamma$. This plot shows that the new 0.95 confidence interval
for $\theta$ has coverage probability 0.95 throughout the parameter space. As proved in
Section 3, the scaled expected length of the new confidence interval for $\theta$ is an even
function of $\gamma$. The bottom panel of Figure 3 is a plot of the square of the scaled
expected length of the new 0.95 confidence interval for $\theta$ as a function of $\gamma$. When
the prior information is correct (i.e. $\gamma = 0$), we gain since the square of the scaled
expected length is substantially smaller than 1. The maximum value of the square
of the scaled expected length is not too large. The new 0.95 confidence interval
for $\theta$ coincides with the standard $1 - \alpha$ confidence interval when the data strongly
contradict the prior information. This is reflected in Figure 3 by the fact that the
square of the scaled expected length approaches 1 as $\gamma \to \infty$.

2. The naive confidence interval 

The naive $1 - \alpha$ confidence interval for $\theta$ is constructed as follows. We carry
out a preliminary test of the null hypothesis that $\tau = 0$ against the alternative
hypothesis that $\tau \ne 0$. If this null hypothesis is accepted then the confidence
interval is constructed assuming that it was known a priori that $\tau = 0$; otherwise
the standard $1 - \alpha$ confidence interval for $\theta$ is used. As noted in the introduction, this
confidence interval will often have minimum coverage probability far below $1 - \alpha$,
showing that it is completely inadequate. In this section we describe the naive
confidence interval in a new form that will be used to provide some motivation for
the new confidence interval described in the next section.

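Let $\hat\beta$ denote the least squares estimator of $\beta$. Let $\hat\theta$ denote $a^T \hat\beta$, i.e. the least
squares estimator of $\theta$. Also, let $\hat\tau$ denote $c^T \hat\beta - t$, i.e. the least squares estimator
of $\tau$. Define the matrix $V$ to be the covariance matrix of $(\hat\theta, \hat\tau)$ divided by $\sigma^2$. Let
$v_{ij}$ denote the $(i,j)$th element of $V$. The standard $1 - \alpha$ confidence interval for $\theta$ is
$I = \big[\hat\theta - t_{n-p,1-\alpha/2} \sqrt{v_{11}}\, \hat\sigma,\ \hat\theta + t_{n-p,1-\alpha/2} \sqrt{v_{11}}\, \hat\sigma\big]$, where the quantile $t_{m,a}$ is defined
by $P(T \le t_{m,a}) = a$ for $T \sim t_m$ and $\hat\sigma^2 = (Y - X\hat\beta)^T (Y - X\hat\beta)/(n - p)$.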

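The naive $1 - \alpha$ confidence interval for $\theta$ is obtained as follows. The usual test
statistic for testing the null hypothesis that $\tau = 0$ against the alternative hypothesis
that $\tau \ne 0$ is $\hat\tau / (\hat\sigma \sqrt{v_{22}})$. Suppose that, for some given positive number $q$, we fix $\tau$
at 0 if $|\hat\tau| / (\hat\sigma \sqrt{v_{22}}) \le q$; otherwise we allow $\tau$ to vary freely. We use the notation
$[a \pm b]$ for the interval $[a - b, a + b]$ ($b > 0$). Also define $\rho = v_{12} / \sqrt{v_{11} v_{22}}$. Note
that $\rho$ is the correlation between $\hat\theta$ and $\hat\tau$ and so it satisfies $-1 < \rho < 1$. The naive
$1 - \alpha$ confidence interval is as follows (Kabaila and Giri (2009a)). If $|\hat\tau| / (\hat\sigma \sqrt{v_{22}}) > q$
then this confidence interval is $\big[\hat\theta - t_{n-p,1-\alpha/2} \sqrt{v_{11}}\, \hat\sigma,\ \hat\theta + t_{n-p,1-\alpha/2} \sqrt{v_{11}}\, \hat\sigma\big]$. If, on
the other hand, $|\hat\tau| / (\hat\sigma \sqrt{v_{22}}) \le q$ then this confidence interval is

$$\left[\hat\theta - \frac{v_{12}}{v_{22}}\,\hat\tau \pm t_{n-p+1,1-\alpha/2} \sqrt{\frac{(n-p)\hat\sigma^2 + \hat\tau^2 / v_{22}}{n-p+1}} \sqrt{v_{11} - \frac{v_{12}^2}{v_{22}}}\,\right].$$

This confidence interval can be expressed in the new form

$$\left[\hat\theta - \sqrt{v_{11}}\,\hat\sigma\, b\!\left(\frac{\hat\tau}{\hat\sigma \sqrt{v_{22}}}\right) \pm \sqrt{v_{11}}\,\hat\sigma\, s\!\left(\frac{|\hat\tau|}{\hat\sigma \sqrt{v_{22}}}\right)\right]$$

where

$$b(x) = \begin{cases} 0 & \text{for } |x| > q \\ \rho x & \text{for } |x| \le q \end{cases}$$

$$s(x) = \begin{cases} t_{n-p,1-\alpha/2} & \text{for } x > q \\ t_{n-p+1,1-\alpha/2} \sqrt{\dfrac{(1-\rho^2)(n-p+x^2)}{n-p+1}} & \text{for } 0 < x \le q. \end{cases}$$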



In Section 4 we will consider the example of a $2 \times 2$ factorial experiment with
20 replicates. Here $p = 4$. The parameter of interest $\theta$ is the simple effect (expected
response when factor A is high and factor B is low) - (expected response when factor
A is low and factor B is low). We identify $\tau$ with the two-factor interaction, so that
$\rho = -1/\sqrt{2} = -0.7071068$. Suppose that we have uncertain prior information that
the two-factor interaction is zero. Also suppose that we carry out a preliminary test
of the null hypothesis that the two-factor interaction is zero against the alternative
hypothesis that this interaction is non-zero. Let the level of significance of this test
be 0.05, so that $q = 1.991673$. Figure 1 is a plot of the functions $b$ and $s$ for the
resulting naive 0.95 confidence interval for $\theta$. This confidence interval is completely
inadequate, as its minimum coverage probability is 0.7306. It also has the unpleasant
feature that its endpoints are discontinuous functions of the data.
Figure 1: Plots of the functions $b$ and $s$ for the naive 0.95 confidence interval for
the simple effect $\theta$ in the context of the $2 \times 2$ factorial experiment with 20 replicates.
This confidence interval is based on a preliminary test of the null hypothesis that the
two-factor interaction is zero against the alternative hypothesis that this interaction
is non-zero, with level of significance 0.05.
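
The poor coverage of the naive interval can be explored numerically. The following
Python sketch is an illustration only; it is not the MATLAB code used for the paper
and all function names are hypothetical. It assumes the restricted least squares form
of the naive interval when the preliminary test accepts, namely $b(x) = \rho x$ and
$s(x) = t_{n-p+1,1-\alpha/2} \sqrt{(1-\rho^2)(n-p+x^2)/(n-p+1)}$ for $|x| \le q$. It computes t quantiles by
elementary numerical integration and bisection, and it estimates coverage by
simulating the standardized quantities $H = \hat\tau/(\sigma\sqrt{v_{22}}) \sim N(\gamma, 1)$,
$G = (\hat\theta - \theta)/(\sigma\sqrt{v_{11}})$ and $W = \hat\sigma/\sigma$ rather than raw data.

```python
import math
import random


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """CDF for x >= 0: Simpson's rule on [0, x] plus the mass 1/2 below 0."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(prob, df):
    """Quantile for prob > 1/2, found by bisection on t_cdf."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


M = 76                            # n - p: 80 observations, 4 parameters
RHO = -1 / math.sqrt(2)           # correlation quoted in the text
T_STD = t_quantile(0.975, M)      # t_{n-p, 1-alpha/2}
T_RES = t_quantile(0.975, M + 1)  # t_{n-p+1, 1-alpha/2}
Q = T_STD                         # pretest of level 0.05 uses the same quantile


def b_naive(x):
    """Assumed form of b for the naive interval."""
    return RHO * x if abs(x) <= Q else 0.0


def s_naive(x):
    """Assumed form of s for the naive interval (x >= 0)."""
    if x > Q:
        return T_STD
    return T_RES * math.sqrt((1 - RHO ** 2) * (M + x * x) / (M + 1))


def coverage(gamma, reps=20000, seed=1):
    """Monte Carlo estimate of the naive interval's coverage at gamma."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        h = rng.gauss(gamma, 1.0)
        g = rng.gauss(RHO * (h - gamma), math.sqrt(1 - RHO ** 2))
        w = math.sqrt(rng.gammavariate(M / 2, 2.0) / M)  # sqrt(chi2_M / M)
        centre = w * b_naive(h / w)
        half = w * s_naive(abs(h) / w)
        hits += centre - half <= g <= centre + half
    return hits / reps


if __name__ == "__main__":
    print("q =", round(Q, 6))
    for gamma in (0.0, 2.0, 10.0):
        print("gamma =", gamma, "estimated coverage =", coverage(gamma))
```

Under these assumptions the estimated coverage near $\gamma = 2$ should fall well below
0.95, in line with the minimum coverage probability quoted in the text, whereas for
large $\gamma$ the preliminary test almost always rejects and the coverage is close to 0.95.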



3. New confidence interval utilizing prior information 

In this section we describe a broad class of confidence intervals for $\theta$. These
confidence intervals are required to have endpoints that are smooth functions of the
data. They are also required to coincide with the standard $1 - \alpha$ confidence interval
when the data strongly contradict the prior information. We provide computationally
convenient expressions for the coverage probability and the scaled expected
length for confidence intervals from this class. These computationally convenient
expressions were first described by Kabaila and Giri (2007a,b). We then describe
a weight function for the difference ((scaled expected length of the confidence
interval) - (scaled expected length of the standard $1 - \alpha$ confidence interval)). This
weight function gives the largest weight to this difference when $\tau = 0$, i.e. when the
prior information is correct. We find an interval that is optimal in the sense that it
minimizes the weighted average of this difference subject to the constraint that it
has minimum coverage probability $1 - \alpha$. Our choice of the weight function ensures
that this interval utilizes the prior information.

We introduce a confidence interval for $\theta$ that is similar in form to the naive $1 - \alpha$
confidence interval, described in the previous section, but with a great "loosening
up" of the forms that the functions $b$ and $s$ can take. Define the following confidence
interval for $\theta$

$$J(b, s) = \left[\hat\theta - \sqrt{v_{11}}\,\hat\sigma\, b\!\left(\frac{\hat\tau}{\hat\sigma \sqrt{v_{22}}}\right) \pm \sqrt{v_{11}}\,\hat\sigma\, s\!\left(\frac{|\hat\tau|}{\hat\sigma \sqrt{v_{22}}}\right)\right]$$

where the functions $b$ and $s$ are required to satisfy the following restriction.

Restriction 1

$b : \mathbb{R} \to \mathbb{R}$ is constrained to be an odd function and $s : [0, \infty) \to [0, \infty)$.

The motivation for restricting attention to this form of interval is provided by the
new invariance arguments presented in Appendix A. We also require that the
functions $b$ and $s$ satisfy the following restriction.

Restriction 2

$b$ and $s$ are continuous functions.

This implies that the endpoints of the confidence interval $J(b, s)$ are continuous
functions of the data. Finally, we require the confidence interval $J(b, s)$ to coincide
with the standard $1 - \alpha$ confidence interval $I$ when the data strongly contradict
the prior information. The statistic $|\hat\tau| / (\hat\sigma \sqrt{v_{22}})$ provides some indication of how far
away $\tau / (\sigma \sqrt{v_{22}})$ is from 0. We therefore require that the functions $b$ and $s$ satisfy
the following restriction.

Restriction 3

$b(x) = 0$ for all $|x| \ge d$ and $s(x) = t_{n-p,1-\alpha/2}$ for all $x \ge d$, where $d$ is a (sufficiently
large) specified positive number.

Define $\gamma = \tau / (\sigma \sqrt{v_{22}})$, $G = (\hat\theta - \theta) / (\sigma \sqrt{v_{11}})$ and $H = \hat\tau / (\sigma \sqrt{v_{22}})$. Note that

$$\begin{bmatrix} G \\ H \end{bmatrix} \sim N\!\left( \begin{bmatrix} 0 \\ \gamma \end{bmatrix}, \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix} \right)$$

where, as defined in Section 2, $\rho = v_{12} / \sqrt{v_{11} v_{22}}$. Also define $W = \hat\sigma / \sigma$. Note that
$(G, H)$ and $W$ are independent. Also, $W$ has the same distribution
as $\sqrt{Q / (n - p)}$ where $Q \sim \chi^2_{n-p}$. Let $f_W$ denote the probability density function of
$W$.

It is straightforward to show that the coverage probability $P(\theta \in J(b, s))$ is
equal to $P(\ell(H, W) \le G \le u(H, W))$, where the functions $\ell(\cdot, \cdot) : \mathbb{R} \times [0, \infty) \to \mathbb{R}$
and $u(\cdot, \cdot) : \mathbb{R} \times [0, \infty) \to \mathbb{R}$ are defined by $\ell(h, w) = b(h/w)\, w - s(|h|/w)\, w$ and
$u(h, w) = b(h/w)\, w + s(|h|/w)\, w$. For given $b$, $s$ and $\rho$, the coverage probability of
$J(b, s)$ is a function of $\gamma$. We denote this coverage probability by $c(\gamma; b, s, \rho)$.

Part of our evaluation of the confidence interval $J(b, s)$ consists of comparing it
with the standard $1 - \alpha$ confidence interval $I$ using the criterion

$$\frac{\text{expected length of } J(b, s)}{\text{expected length of } I}.$$

We call this the scaled expected length of $J(b, s)$. This is equal to

$$\frac{E\big(W\, s(|H|/W)\big)}{t_{n-p,1-\alpha/2}\, E(W)}. \qquad (3)$$

This is a function of $\gamma$ for given $s$. We denote this function by $e(\gamma; s)$. Clearly, for
given $s$, $e(\gamma; s)$ is an even function of $\gamma$.

Our aim is to find functions $b$ and $s$ that satisfy Restrictions 1-3 and such that
(a) the minimum of $c(\gamma; b, s, \rho)$ over $\gamma$ is $1 - \alpha$ and (b)

$$\int_{-\infty}^{\infty} \big(e(\gamma; s) - 1\big)\, d\nu(\gamma) \qquad (4)$$

is minimized, where the weight function $\nu$ has been chosen to be

$$\nu(x) = \lambda x + \mathcal{H}(x) \quad \text{for all } x \in \mathbb{R}, \qquad (5)$$

where $\lambda$ is a specified nonnegative number and $\mathcal{H}$ is the unit step function defined
by $\mathcal{H}(x) = 0$ for $x < 0$ and $\mathcal{H}(x) = 1$ for $x \ge 0$. The larger the value of $\lambda$, the
smaller the relative weight given to minimizing $e(\gamma; s)$ for $\gamma = 0$, as opposed to
minimizing $e(\gamma; s)$ for other values of $\gamma$. Similarly to Farchione and Kabaila (2008),
who consider a much simpler model, we expect the weight function (5) to lead to a
$1 - \alpha$ confidence interval for $\theta$ that has expected length that (a) is relatively small
when $\tau = 0$ and (b) has maximum value that is not too large.

The following theorem provides new computationally convenient expressions for
the coverage probability and scaled expected length of $J(b, s)$.

Theorem 1.

(a) Define the functions $k^\dagger(h, w, \gamma, \rho) = \Psi\big(-t_{n-p,1-\alpha/2}\, w,\ t_{n-p,1-\alpha/2}\, w;\ \rho(h - \gamma),\ 1 - \rho^2\big)$
and $k(h, w, \gamma, \rho) = \Psi\big(\ell(h, w),\ u(h, w);\ \rho(h - \gamma),\ 1 - \rho^2\big)$, where $\Psi(x, y; \mu, v) = P(x \le Z \le y)$
for $Z \sim N(\mu, v)$. The coverage probability of $J(b, s)$ is denoted by $c(\gamma; b, s, \rho)$
and is equal to

$$(1 - \alpha) + \int_0^{\infty} \int_{-d}^{d} \big(k(wx, w, \gamma, \rho) - k^\dagger(wx, w, \gamma, \rho)\big)\, \phi(wx - \gamma)\, dx\, w\, f_W(w)\, dw \qquad (6)$$

where $\phi$ denotes the $N(0, 1)$ probability density function. For given $b$, $s$ and $\rho$,
$c(\gamma; b, s, \rho)$ is an even function of $\gamma$.

(b) The scaled expected length of $J(b, s)$ is

$$e(\gamma; s) = 1 + \frac{1}{t_{n-p,1-\alpha/2}\, E(W)} \int_0^{\infty} \int_{-d}^{d} \big(s(|x|) - t_{n-p,1-\alpha/2}\big)\, \phi(wx - \gamma)\, dx\, w^2 f_W(w)\, dw. \qquad (7)$$
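
The expression for the scaled expected length in Theorem 1(b) lends itself to direct
numerical evaluation. The following Python sketch is an illustration under stated
assumptions, not the authors' MATLAB implementation. It takes $n - p = 76$ and
$d = 6$ from the example in Section 4, uses $t_{76, 0.975} = 1.991673$ (the value of $q$
quoted in Section 2), approximates the double integral by a midpoint rule on an
assumed finite range for $w$, and uses a hypothetical function `s_dip` purely to
exercise the code.

```python
import math

M = 76         # n - p
T = 1.991673   # t_{n-p, 1-alpha/2} for alpha = 0.05, as quoted in Section 2
D = 6.0        # the constant d of Restriction 3


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)


def f_w(w):
    """Density of W = sqrt(Q / M) where Q is chi-squared with M d.f."""
    log_f = (math.log(2 * M * w) + (M / 2 - 1) * math.log(M * w * w)
             - M * w * w / 2 - (M / 2) * math.log(2) - math.lgamma(M / 2))
    return math.exp(log_f)


# E(W) = sqrt(2 / M) * Gamma((M + 1) / 2) / Gamma(M / 2)
E_W = math.sqrt(2 / M) * math.exp(math.lgamma((M + 1) / 2) - math.lgamma(M / 2))


def scaled_expected_length(gamma, s, nw=200, nx=300, w_lo=0.4, w_hi=1.7):
    """Midpoint-rule approximation to e(gamma; s) in Theorem 1(b).

    The range [w_lo, w_hi] is an assumed truncation of the integral over w.
    """
    dw = (w_hi - w_lo) / nw
    dx = 2 * D / nx
    total = 0.0
    for i in range(nw):
        w = w_lo + (i + 0.5) * dw
        inner = 0.0
        for j in range(nx):
            x = -D + (j + 0.5) * dx
            inner += (s(abs(x)) - T) * phi(w * x - gamma)
        total += inner * dx * w * w * f_w(w) * dw
    return 1.0 + total / (T * E_W)


def s_standard(x):
    """s for the standard interval: constant at the t quantile."""
    return T


def s_dip(x):
    """Hypothetical s: below T near 0, equal to T for x >= 2."""
    return T - 0.3 * max(0.0, 1.0 - x / 2.0)


if __name__ == "__main__":
    for gamma in (0.0, 1.0, 2.0, 4.0):
        print(gamma, scaled_expected_length(gamma, s_dip))
```

With `s_standard` the integrand vanishes and the function returns exactly 1, which is
the scaled expected length of the standard interval. The hypothetical `s_dip` has not
been checked against the coverage constraint, so it only illustrates how $e(\gamma; s)$
responds to narrowing the interval near $x = 0$.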



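Substituting (7) into (4), we obtain that (4) is equal to

$$\frac{1}{t_{n-p,1-\alpha/2}\, E(W)} \int_{-\infty}^{\infty} \int_0^{\infty} \int_{-d}^{d} \big(s(|x|) - t_{n-p,1-\alpha/2}\big)\, \phi(wx - \gamma)\, dx\, w^2 f_W(w)\, dw\, d\nu(\gamma)$$

$$= \frac{1}{t_{n-p,1-\alpha/2}\, E(W)} \int_0^{\infty} \int_{-d}^{d} \big(s(|x|) - t_{n-p,1-\alpha/2}\big) \int_{-\infty}^{\infty} \phi(wx - \gamma)\, d\nu(\gamma)\, dx\, w^2 f_W(w)\, dw$$

$$= \frac{2}{t_{n-p,1-\alpha/2}\, E(W)} \int_0^{\infty} \int_0^{d} \big(s(x) - t_{n-p,1-\alpha/2}\big) \big(\lambda + \phi(wx)\big)\, dx\, w^2 f_W(w)\, dw. \qquad (8)$$

The last step uses $\int_{-\infty}^{\infty} \phi(wx - \gamma)\, d\nu(\gamma) = \lambda + \phi(wx)$, which follows from (5), and the
fact that the integrand is an even function of $x$.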
For computational feasibility, we specify the following parametric forms for the
functions $b$ and $s$. We require $b$ to be a continuous odd function and so it is necessary that
$b(0) = 0$. Suppose that $x_1, \ldots, x_q$ satisfy $0 = x_1 < x_2 < \cdots < x_q = d$. Obviously,
$b(x_1) = 0$, $b(x_q) = 0$ and $s(x_q) = t_{n-p,1-\alpha/2}$. The function $b$ is fully specified by the
vector $(b(x_2), \ldots, b(x_{q-1}))$ as follows. Because $b$ is assumed to be an odd function,
we know that $b(-x_i) = -b(x_i)$ for $i = 2, \ldots, q$. We specify the value of $b(x)$ for any
$x \in [-d, d]$ by cubic spline interpolation for these given function values, subject to
the constraint that $b'(-d) = 0$ and $b'(d) = 0$. We fully specify the function $s$ by the
vector $(s(x_1), \ldots, s(x_{q-1}))$ as follows. The value of $s(x)$ for any $x \in [0, d]$ is specified
by cubic spline interpolation for these given function values (without any endpoint
conditions on the first derivative of $s$). We call $x_1, x_2, \ldots, x_q$ the knots.

To conclude, the new $1 - \alpha$ confidence interval for $\theta$ that utilizes the prior
information that $\tau = 0$ is obtained as follows. For a judiciously-chosen set of values
of $d$, $\lambda$ and knots $x_i$, we carry out the following computational procedure.

Computational Procedure

Compute the functions $b$ and $s$, satisfying Restrictions 1-3 and taking the parametric
forms described above, such that (a) the minimum over $\gamma \ge 0$ of (6) is $1 - \alpha$ and
(b) the criterion (8) is minimized. Plot $e^2(\gamma; s)$, the square of the scaled expected
length, as a function of $\gamma \ge 0$.

Based on these plots and the strength of our prior information that $\tau = 0$, we choose
appropriate values of $d$, $\lambda$ and knots $x_i$. The confidence interval corresponding to
this choice is the new $1 - \alpha$ confidence interval for $\theta$.

Remark 3.1 Suppose that $\lambda > 0$ is fixed. Also suppose that we apply the
Computational Procedure without any parametric restrictions of the form described above.
The structure of the criterion (4) when $\nu$ is given by (5) makes it highly plausible
that the resulting $1 - \alpha$ confidence interval for $\theta$ will have a scaled expected length
$e(\gamma; s)$ that converges uniformly in $\gamma$ to some limiting function as $d \to \infty$. It is also
highly plausible that this limiting function can be found to a very good approximation
by applying this Computational Procedure for $d$ sufficiently large and knots $x_i$
sufficiently closely spaced.
4. Application to the analysis of data from a 2 x 2 factorial
experiment

In this section we consider a $2 \times 2$ factorial experiment with 20 replicates and
parameter of interest $\theta$ the simple effect (expected response when factor A is high
and factor B is low) - (expected response when factor A is low and factor B is
low). We suppose that we have uncertain prior information that the two-factor
interaction is zero. We use this example to illustrate the properties of the new $1 - \alpha$
confidence interval for $\theta$ that utilizes this prior information, when $1 - \alpha = 0.95$. All
of the computations presented in this paper were performed with programs written
in MATLAB, using the Optimization and Statistics toolboxes.

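Let $x_1$ take the values $-1$ and 1 when the factor A takes the values low and
high respectively. Also let $x_2$ take the values $-1$ and 1 when the factor B takes the
values low and high respectively. In other words, $x_1$ and $x_2$ are the coded values of
the factors A and B respectively. The model for this experiment is

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \varepsilon \qquad (9)$$

where $Y$ is the response, $\beta_0$, $\beta_1$, $\beta_2$ and $\beta_{12}$ are unknown parameters and the $\varepsilon$
for different response measurements are independent and identically $N(0, \sigma^2)$
distributed. Thus $\theta = 2(\beta_1 - \beta_{12})$. Let $\hat\beta_1$ and $\hat\beta_{12}$ denote the least squares estimators
of $\beta_1$ and $\beta_{12}$ respectively. The least squares estimator of $\theta$ is $\hat\theta = 2(\hat\beta_1 - \hat\beta_{12})$. Our
uncertain prior information is that $\beta_{12} = 0$. Note that, since the $n = 80$ runs form an
orthogonal design, $\hat\beta_1$ and $\hat\beta_{12}$ are uncorrelated, each with variance $\sigma^2/80$, so that the
covariance matrix of $(\hat\theta, \hat\beta_{12})$ is

$$\sigma^2 \begin{bmatrix} 1/10 & -1/40 \\ -1/40 & 1/80 \end{bmatrix}.$$

Hence $\rho = -1/\sqrt{2}$.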
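
The value of $\rho$ can be checked directly from the design. The following Python
sketch is illustrative only; it assumes the coded design described above, with each of
the four factor combinations replicated 20 times, and takes $a = (0, 2, 0, -2)$ and
$c = (0, 0, 0, 1)$ for the parameter ordering $(\beta_0, \beta_1, \beta_2, \beta_{12})$, so that $\theta = a^T \beta$ and
$\tau = c^T \beta$.

```python
import math
from itertools import product

REPLICATES = 20

# Design matrix rows (1, x1, x2, x1 * x2), one row per response measurement.
rows = [(1, x1, x2, x1 * x2)
        for x1, x2 in product((-1, 1), repeat=2)
        for _ in range(REPLICATES)]
n, p = len(rows), 4

# X^T X; for this balanced coded design it equals n times the identity.
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
is_orthogonal = all(xtx[i][j] == (n if i == j else 0)
                    for i in range(p) for j in range(p))

a = (0, 2, 0, -2)   # theta = 2 * (beta_1 - beta_12)
c = (0, 0, 0, 1)    # tau = beta_12


def bilinear(u, v):
    """u^T (X^T X)^{-1} v, using (X^T X)^{-1} = I / n for this design."""
    return sum(ui * vi for ui, vi in zip(u, v)) / n


v11, v12, v22 = bilinear(a, a), bilinear(a, c), bilinear(c, c)
rho = v12 / math.sqrt(v11 * v22)

if __name__ == "__main__":
    print("n - p =", n - p)
    print("V =", [[v11, v12], [v12, v22]])
    print("rho =", rho)
```

The same calculation gives $n - p = 76$ residual degrees of freedom, which is the value
behind the quantile $q = 1.991673$ quoted in Section 2.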

We followed the Computational Procedure, described at the end of the previous
section, with $d = 6$, $\lambda = 0.2$ and evenly-spaced knots $x_i$ at $0, 1, 2, \ldots, 6$. The
resulting functions $b$ and $s$, which specify the new 0.95 confidence interval for $\theta$,
are plotted in Figure 2. The performance of this confidence interval is shown in
Figure 3. This confidence interval has coverage probability 0.95 throughout the
parameter space. When the prior information is correct (i.e. $\gamma = 0$), we gain since
$e^2(0; s) = 0.8683$. The maximum value of $e^2(\gamma; s)$ is 1.1070. This confidence interval
coincides with the standard $1 - \alpha$ confidence interval for $\theta$ when the data strongly
contradict the prior information, so that $e^2(\gamma; s)$ approaches 1 as $\gamma \to \infty$. It is
interesting to note the broad qualitative similarities between the functions plotted
in Figures 1 and 2.

These values of $d = 6$, $\lambda = 0.2$ and knots $x_i$ were obtained after a search that we
summarize as follows. Consider $d = 6$, evenly-spaced knots $x_i$ at $0, 1, 2, \ldots, 6$ and
$\lambda = 0.05$, 0.2, 0.5 and 1. The Computational Procedure was applied for each of these
values. As expected from the form of the weight function, for each of these values of
$\lambda$, $e^2(\gamma; s)$ is minimized at $\gamma = 0$. For a given value of $\lambda$, define the 'expected gain'
to be $(1 - e^2(0; s))$ and the 'maximum potential loss' to be $(\max_\gamma e^2(\gamma; s) - 1)$.
As shown in Table 1, as $\lambda$ increases (a) the expected gain decreases and (b) the
ratio (expected gain)/(maximum potential loss) increases. By choosing $\lambda = 0.2$ we
have both a reasonably large expected gain and a reasonably large value of the ratio
(expected gain)/(maximum potential loss).



$\lambda$                                          0.05      0.2       0.5       1
expected gain                                      0.196     0.1317    0.0822    0.043
maximum potential loss                             0.2610    0.1070    0.0503    0.0248
(expected gain)/(maximum potential loss)           0.7509    1.2308    1.6341    1.7338

Table 1: Performance of the new 0.95 confidence interval for $d = 6$ and knots $x_i$ at
$0, 1, \ldots, 6$ when we vary over $\lambda \in \{0.05, 0.2, 0.5, 1\}$.
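
The trade-off summarized in Table 1 can be re-derived from its first two rows. The
short Python sketch below is only a consistency check on the tabulated figures; the
variable names are hypothetical. Because the gains and losses in the table are
rounded, the recomputed ratios are expected to agree with the reported ratios only
to about three decimal places.

```python
# Values transcribed from Table 1, keyed by lambda.
expected_gain = {0.05: 0.196, 0.2: 0.1317, 0.5: 0.0822, 1.0: 0.043}
max_potential_loss = {0.05: 0.2610, 0.2: 0.1070, 0.5: 0.0503, 1.0: 0.0248}
reported_ratio = {0.05: 0.7509, 0.2: 1.2308, 0.5: 1.6341, 1.0: 1.7338}

lambdas = sorted(expected_gain)

# Ratio (expected gain) / (maximum potential loss), recomputed from the rows.
recomputed_ratio = {lam: expected_gain[lam] / max_potential_loss[lam]
                    for lam in lambdas}

# Largest disagreement between recomputed and reported ratios.
max_discrepancy = max(abs(recomputed_ratio[lam] - reported_ratio[lam])
                      for lam in lambdas)

if __name__ == "__main__":
    for lam in lambdas:
        print(lam, round(recomputed_ratio[lam], 4), reported_ratio[lam])
    print("largest discrepancy:", max_discrepancy)
```

The check also makes the two monotone trends explicit: the expected gain falls and
the ratio rises as $\lambda$ increases.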

Now consider $\lambda = 0.2$ and evenly-spaced knots $x_i$ at $0, 1, 2, \ldots, d$ where $d = 4$, 6,
8 and 10. The Computational Procedure was applied for each of these values. There
was a marked improvement in performance of the resulting 0.95 confidence interval
when $d$ was increased from 4 to 6. However, the improvement in performance of
the resulting 0.95 confidence interval was negligible when $d$ was increased from 6 to 8 and
from 6 to 10. This suggests that increasing $d$ beyond 6 will lead to a negligible
improvement in performance of the confidence interval.

Finally, consider $d = 6$, $\lambda = 0.2$ and two sets of evenly-spaced knots $x_i$ at
$0, 0.6, 1.2, 1.8, \ldots, 6$ and $0, 0.5, 1, 1.5, \ldots, 6$. The Computational Procedure was
applied to both of these sets of knots. The improvements in performance of the
resulting 0.95 confidence interval (compared to the performance for $d = 6$, $\lambda = 0.2$ and
evenly-spaced knots $x_i$ at $0, 1, 2, \ldots, 6$) were practically negligible. This suggests
that there will be a practically negligible improvement in performance if the spacing
between the evenly-spaced knots is reduced to less than 1.
Figure 2: Plots of the functions $b$ and $s$ for the new $1 - \alpha$ confidence interval in the
context of a $2 \times 2$ factorial experiment with 20 replicates, parameter of interest the
simple effect $\theta = 2(\beta_1 - \beta_{12})$ and $1 - \alpha = 0.95$. These functions were obtained using
$d = 6$, $\lambda = 0.2$ and the knots $x_i$ at $0, 1, 2, \ldots, 6$.
Figure 3: Plots of the coverage probability and $e^2(\gamma; s)$, the squared scaled expected
length, (as functions of $\gamma = \beta_{12} / \sqrt{\mathrm{var}(\hat\beta_{12})}$) of the new 0.95 confidence interval for
the simple effect $\theta = 2(\beta_1 - \beta_{12})$ for the $2 \times 2$ factorial experiment with 20 replicates.
These functions were obtained using $d = 6$, $\lambda = 0.2$ and the knots $x_i$ at $0, 1, 2, \ldots, 6$.
5. Discussion

Discussion 5.1 Our motivation for the weight function (5) is as follows. Suppose
that the only restriction on the functions $b : \mathbb{R} \to \mathbb{R}$ and $s : [0, \infty) \to [0, \infty)$ is
that $b$ is an odd function. Consider the weight function $\nu = \mathcal{H}$, which corresponds
to all of the weight being placed at $\tau = 0$. The minimization of (4), subject to
$P(\theta \in J(b, s)) \ge 1 - \alpha$ for all $\gamma$, leads to a $1 - \alpha$ confidence interval for $\theta$ with
the following properties. This interval has the smallest expected length when $\tau = 0$
(i.e. when the prior information is correct) of any $1 - \alpha$ confidence interval for
$\theta$. However, this confidence interval has the weakness that its expected length
approaches infinity as $|\gamma| \to \infty$ (Tuck, 2006). Now consider the weight function
$\nu(x) = x$, which corresponds to a uniform weight over $\mathbb{R}$. The minimization of (4),
subject to $P(\theta \in J(b, s)) \ge 1 - \alpha$ for all $\gamma$, leads to the standard $1 - \alpha$ confidence
interval $I$. Finally, consider the weight function (5), which is a mixture of the weight
functions $\mathcal{H}$ and $x$, for fixed $\lambda > 0$. This weight function puts a large amount of
weight at $\tau = 0$, consistent with our desire that the confidence interval has relatively
small expected length when the prior information is correct. Also, the $x$ component
of this weight function leads to a confidence interval whose expected length has a
maximum value that is finite. In addition, the structure of the criterion (4) when $\nu$
is given by (5) makes it highly plausible that the $1 - \alpha$ confidence interval resulting
from the minimization of (4), subject to $P(\theta \in J(b, s)) \ge 1 - \alpha$ for all $\gamma$, will have
the desirable feature that it approaches the standard $1 - \alpha$ confidence interval $I$ as
the data increasingly contradict the prior information. Fortuitously, this property
leads to the computational advantage described in Remark 3.1.

Discussion 5.2 The new $1 - \alpha$ confidence interval is computed to satisfy the
constraint that its minimum coverage probability is $1 - \alpha$. For the example described
in Section 4, it is remarkable that the new $1 - \alpha$ confidence interval has coverage
probability equal to $1 - \alpha$ throughout the parameter space. The new $1 - \alpha$ confidence
interval has been computed for a wide range of values of $1 - \alpha$, $\lambda$, $\rho$, $n - p$ (including
the limiting case $n - p \to \infty$), $d$ and knots $x_i$. In each case, the new $1 - \alpha$ confidence
interval has coverage probability equal to $1 - \alpha$ throughout the parameter space.
This provides strong empirical evidence that the new $1 - \alpha$ confidence interval has
the attractive property that its coverage probability is equal to $1 - \alpha$ throughout
the parameter space.

Discussion 5.3 The new $1 - \alpha$ confidence interval has been computed for a wide
range of values of $1 - \alpha$, $\lambda$, $\rho$, $n - p$ (including the limiting case $n - p \to \infty$), $d$ and
knots $x_i$. For each of these values of $1 - \alpha$, $\lambda$, $d$ and knots $x_i$, $e^2(0; s)$ (which is the
minimum value of $e^2(\gamma; s)$) decreases when $|\rho|$ increases and/or $(n - p)$ decreases.

Discussion 5.4 Consider the particular case that $\rho = 0$. In this case, we expect
that any improvement in performance of the new $1 - \alpha$ confidence interval over the
standard $1 - \alpha$ confidence interval $I$ can only be due to improved estimation of the
parameter $\sigma$. Computations show that the new $1 - \alpha$ confidence interval performs
well (in terms of utilizing the uncertain prior information) for small $n - p$, when $\lambda$
is chosen appropriately. However, the new $1 - \alpha$ confidence interval approaches the
standard $1 - \alpha$ confidence interval $I$ as $n - p \to \infty$.

Discussion 5.5 We briefly compare our frequentist approach with a Bayesian
approach to the problem stated in the paper. A full discussion will be presented in a
separate paper. For simplicity, suppose that $\sigma^2$ is known and that

$$\begin{bmatrix} \hat\theta \\ \hat\tau \end{bmatrix} \sim N\!\left( \begin{bmatrix} \theta \\ \tau \end{bmatrix}, \sigma^2 \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix} \right).$$

For the Bayesian approach, suppose that we choose independent prior pdf's for $\theta$
and $\tau$. Also suppose that for this approach (a) $\theta$ has a uniform improper prior pdf
and (b) $\tau$ has the prior pdf $\xi \delta(\tau) + (1 - \xi)$ where $\delta$ denotes the delta function and $\xi$
is a fixed number satisfying $0 < \xi \le 1$. Contrasting features of the new frequentist
$1 - \alpha$ confidence interval for $\theta$ described in the present paper and the Bayesian $1 - \alpha$
highest probability density (HPD) regions for $\theta$ include the following:

(a) Suppose that the only restriction on the functions $b : \mathbb{R} \to \mathbb{R}$ and $s : [0, \infty) \to
[0, \infty)$ is that $b$ is an odd function. Consider the weight function $\nu = \mathcal{H}$, which
corresponds to all of the weight being placed at $\tau = 0$. The minimization of (4),
subject to $P(\theta \in J(b, s)) \ge 1 - \alpha$ for all $\gamma$, leads to a $1 - \alpha$ confidence interval
with the smallest expected length when $\tau = 0$ of any $1 - \alpha$ confidence interval for $\theta$.
There is no Bayesian analogue of this confidence interval. If we choose $\xi = 1$ then
the Bayesian $1 - \alpha$ HPD region for $\theta$ is equal to the usual $1 - \alpha$ confidence interval
for $\theta$ based on the assumption that $\tau = 0$. This confidence interval has coverage
probability with infimum 0.

(b) By the appropriate choices of $1-\alpha$, $\xi$, $\rho$, $\sigma$ and $\hat\tau$, one can
find Bayesian $1-\alpha$ HPD regions for $\theta$ that consist of the union of two disjoint
intervals. By contrast, the methodology of the present paper always produces a
confidence interval.

(c) By the appropriate choices of $1-\alpha$, $\xi$ where $\xi < 1$, $\rho$ and $\sigma$, one
can find Bayesian $1-\alpha$ HPD regions for $\theta$ that have frequentist minimum coverage
probabilities far below $1-\alpha$.

Discussion 5.6 We briefly discuss the computation of the new confidence interval.
A full discussion is provided by Giri (2008) and will be presented in a separate
paper. Our first step has been to truncate the integrals with respect to $w$ in (6),
(7) and (8) and to find upper bounds on the truncation errors. The computational
implementation of the constraints that $c(\gamma; b, s, \rho) \ge 1-\alpha$ for all
$\gamma \ge 0$ is as follows. Restriction 3 implies that, for any reasonable choice of the
functions $b$ and $s$, $c(\gamma; b, s, \rho) \to 1-\alpha$ as $\gamma \to \infty$. The
constraints implemented in the computer programs are that
$c(\gamma; b, s, \rho) \ge 1-\alpha$ for each
$\gamma \in \{0, \Delta, 2\Delta, \ldots, M\Delta\}$ where $\Delta$ is sufficiently small
and $M$ is sufficiently large.
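
The grid-based check of the coverage constraints can be illustrated with a minimal sketch. It is not the program of Giri (2008). It assumes the limiting case $n-p \to \infty$ (so that $\hat\sigma = \sigma$ and no integration over $w$ is needed), uses the conditional distribution of $G$ given $H = h$ stated in Appendix B, and integrates over $h$ with a simple trapezoidal rule. The function names, the integration range and the step counts are illustrative choices only.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    """Standard normal probability density function."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def coverage(gamma, b, s, rho, half_width=8.0, n_steps=2000):
    """Coverage probability of J(b, s) at gamma, assuming sigma is known.

    Uses H ~ N(gamma, 1) and G | H = h ~ N(rho * (h - gamma), 1 - rho^2);
    the interval covers theta when b(h) - s(|h|) <= G <= b(h) + s(|h|).
    The trapezoidal rule over [gamma - 8, gamma + 8] is an assumption of
    this sketch, not the method used in the paper.
    """
    sd = math.sqrt(1.0 - rho * rho)
    step = 2.0 * half_width / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        h = gamma - half_width + i * step
        mean = rho * (h - gamma)
        upper = (b(h) + s(abs(h)) - mean) / sd
        lower = (b(h) - s(abs(h)) - mean) / sd
        value = (norm_cdf(upper) - norm_cdf(lower)) * norm_pdf(h - gamma)
        total += value * (0.5 if i in (0, n_steps) else 1.0)
    return total * step

def min_coverage_on_grid(b, s, rho, delta, m):
    """Smallest coverage over the grid {0, delta, 2*delta, ..., m*delta}."""
    return min(coverage(j * delta, b, s, rho) for j in range(m + 1))

def constraints_hold(b, s, rho, alpha, delta, m, tol=1e-9):
    """True if the coverage is at least 1 - alpha at every grid point."""
    return min_coverage_on_grid(b, s, rho, delta, m) >= 1.0 - alpha - tol
```

With $b \equiv 0$ and $s \equiv z_{1-\alpha/2}$ the interval reduces to the standard interval in this limiting case, so `constraints_hold` should return `True`; shrinking $s$ uniformly makes the check fail.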

Discussion 5.7 The new $1-\alpha$ confidence interval for $\theta$ is founded on the
assumption that the random errors $\varepsilon_i$ are independent and identically
$N(0, \sigma^2)$ distributed. This confidence interval is based on the least squares
estimator $\hat\Theta$ of $\theta$ and the estimator $\hat\sigma$ of $\sigma$. Consequently,
it will display the same kind of lack of robustness to non-normality of the random
errors as the standard $1-\alpha$ confidence interval $I$.

Discussion 5.8 We illustrate our method with the following real data set. We extract
a $2 \times 2$ factorial data set from the $2^3$ factorial data set described in Table 7.5
of Box et al (1963) as follows. Define $x_1 = -1$ and $x_1 = 1$ for "Time of addition
of HNO3" equal to 2 hours and 7 hours, respectively. Also define $x_2 = -1$ and
$x_2 = 1$ for "heel absent" and "heel present", respectively. The observed responses
are the following: $y = 87.2$ for $(x_1, x_2) = (-1, -1)$, $y = 88.4$ for $(x_1, x_2) = (1, -1)$,
$y = 86.7$ for $(x_1, x_2) = (-1, 1)$ and $y = 89.2$ for $(x_1, x_2) = (1, 1)$. We use the model
(9). The discussion on p. 265 of Box et al (1963) implies that there is uncertain
prior information that $\beta_{12} = 0$. The discussion on p. 266 of Box et al (1963) implies
that there is an estimator $\hat\sigma^2$ of $\sigma^2$, obtained from other related
experiments, with the property that $\hat\sigma^2/\sigma^2 \sim Q/m$ where $Q \sim \chi^2_m$
and $m$ is effectively infinite. The observed value of $\hat\sigma$ is 0.8. As in Section 4,
define the parameter of interest $\theta$ to be the simple effect (expected response when
$x_1 = 1$ and $x_2 = -1$) $-$ (expected response when $x_1 = -1$ and $x_2 = -1$), so that
$\theta = 2(\beta_1 - \beta_{12})$. Thus

$$V = \begin{bmatrix} 2 & -1/2 \\ -1/2 & 1/4 \end{bmatrix}.$$

The standard 0.95 confidence interval for $\theta$ is $[-1.01745, 3.41745]$. We have also
computed the new 0.95 confidence interval for $\theta$ using $d = 6$, $\lambda = 0.2$ and
equally-spaced knots at $0, 6/8, \ldots, 6$. This confidence interval is
$[-0.81967, 3.26345]$, which is substantially shorter than the standard 0.95 confidence
interval.
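
The matrix $V$ and the standard interval quoted above can be reproduced with a short calculation. The sketch below assumes that the model is $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \varepsilon$ (the form consistent with $\theta = 2(\beta_1 - \beta_{12})$) and that, because $m$ is effectively infinite, the standard interval uses the normal quantile $z_{0.975}$. Variable names are illustrative. It does not compute the new interval, which requires the optimized functions $b$ and $s$.

```python
import math
from statistics import NormalDist

# (x1, x2, y) for the four runs quoted in Discussion 5.8.
runs = [(-1, -1, 87.2), (1, -1, 88.4), (-1, 1, 86.7), (1, 1, 89.2)]
n = len(runs)

# Assumed design matrix columns: intercept, x1, x2, x1 * x2.
X = [[1, x1, x2, x1 * x2] for x1, x2, _ in runs]
y = [resp for _, _, resp in runs]

# The columns are orthogonal with squared length n, so (X'X)^{-1} = I / n
# and the least squares estimate is X'y / n.
beta_hat = [sum(X[i][j] * y[i] for i in range(n)) / n for j in range(4)]

a = [0, 2, 0, -2]  # theta = a' beta = 2 * (beta_1 - beta_12)
c = [0, 0, 0, 1]   # tau = c' beta = beta_12

# Entries of V = [a c]' (X'X)^{-1} [a c].
v11 = sum(ai * ai for ai in a) / n
v12 = sum(ai * ci for ai, ci in zip(a, c)) / n
v22 = sum(ci * ci for ci in c) / n
rho = v12 / math.sqrt(v11 * v22)

theta_hat = sum(ai * bi for ai, bi in zip(a, beta_hat))
tau_hat = sum(ci * bi for ci, bi in zip(c, beta_hat))

sigma_hat = 0.8                      # observed value quoted in the text
z = NormalDist().inv_cdf(0.975)      # normal quantile, since m is effectively infinite
half_width = z * math.sqrt(v11) * sigma_hat
standard_ci = (theta_hat - half_width, theta_hat + half_width)

print("V =", [[v11, v12], [v12, v22]], "rho =", round(rho, 4))
print("standard 0.95 CI =", tuple(round(e, 5) for e in standard_ci))
```

Under these assumptions the computed endpoints agree with the standard interval $[-1.01745, 3.41745]$ reported above.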

Discussion 5.9 Denote the usual $1-\alpha$ confidence interval for $\theta$, based on the
assumption that $\tau = 0$, by $K$. The naive $1-\alpha$ confidence interval described in
Section 2 may be viewed as being obtained via a monotone discontinuous transition, based
on the value of the test statistic $|\hat\tau|/(\hat\sigma\sqrt{v_{22}})$, from the standard
$1-\alpha$ confidence interval $I$ to $K$. What are the properties of the confidence
interval that results from replacing this monotone discontinuous transition by a
monotone continuous transition?

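For simplicity, consider the case that $n-p$ is large. Define the quantile $z_a$ by
$P(Z \le z_a) = a$ for $Z \sim N(0,1)$. In this case,
$I = [\hat\Theta - z_{1-\alpha/2}\sqrt{v_{11}}\,\hat\sigma,\ \hat\Theta + z_{1-\alpha/2}\sqrt{v_{11}}\,\hat\sigma]$
and
$K = [\hat\Theta - (\hat\tau/(\hat\sigma\sqrt{v_{22}}))\,\rho\sqrt{v_{11}}\,\hat\sigma \pm z_{1-\alpha/2}\sqrt{v_{11}}\,\hat\sigma\sqrt{1-\rho^2}]$.
The naive $1-\alpha$ confidence interval for $\theta$ described in Section 2 may be
expressed in the following form

$$g\!\left(\frac{|\hat\tau|}{\hat\sigma\sqrt{v_{22}}}\right) I + \left(1 - g\!\left(\frac{|\hat\tau|}{\hat\sigma\sqrt{v_{22}}}\right)\right) K \qquad (10)$$

where $g : [0,\infty) \to [0,1]$ is the step function defined by $g(x) = 0$ for all
$x \in [0, q]$ and $g(x) = 1$ for all $x > q$.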

Now suppose that, instead, $g$ is a continuous increasing function satisfying
$g(0) = 0$ and $g(x) \to 1$ as $x \to \infty$. What are the properties of the confidence
interval (10) in this case? It is straightforward to show that (10) can be
expressed in the form (1) with $b(x) = (1 - g(|x|))\rho x$ for all $x \in \mathbb{R}$ and
$s(x) = \bigl(g(x)(1 - \sqrt{1-\rho^2}) + \sqrt{1-\rho^2}\bigr) z_{1-\alpha/2}$ for all
$x \ge 0$. In other words, the confidence interval (10) is of the form (1), but with
very severe constraints on the functions $b$ and $s$. In particular,
$s(0) = \sqrt{1-\rho^2}\, z_{1-\alpha/2}$ and $s(x)$ is a nondecreasing function that
converges to $z_{1-\alpha/2}$ as $x \to \infty$. The new $1-\alpha$ confidence interval
described in Section 3 has been computed for a wide range of values of $\rho > 0$ and in
every single case these very severe constraints are far from satisfied by $s$. So, the
confidence interval (10) does not provide a shortcut to finding the new confidence
interval described in Section 3. Indeed, the strength of these constraints on the
functions $b$ and $s$ implies that any confidence interval of the form (10) will be far
inferior to the new confidence interval described in Section 3. The results of Joshi
(1969) show that the confidence interval $I$ is admissible, with the consequence that the
minimum coverage probability of the confidence interval (10) must be less than
$1-\alpha$.
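
The expressions for $b$ and $s$ implied by (10) can be checked with a short calculation. The sketch below assumes that the combination in (10) is taken endpoint by endpoint, and uses the shorthand $x = \hat\tau/(\hat\sigma\sqrt{v_{22}})$, $z = z_{1-\alpha/2}$ and $g = g(|x|)$, which is introduced here only for brevity.

```latex
\begin{align*}
\text{centre of } gI + (1-g)K
  &= g\,\hat\Theta + (1-g)\bigl(\hat\Theta - \rho x \sqrt{v_{11}}\,\hat\sigma\bigr)
   = \hat\Theta - \sqrt{v_{11}}\,\hat\sigma\,(1-g)\rho x, \\
\text{half-width of } gI + (1-g)K
  &= g\,z\sqrt{v_{11}}\,\hat\sigma + (1-g)\,z\sqrt{1-\rho^2}\,\sqrt{v_{11}}\,\hat\sigma
   = \sqrt{v_{11}}\,\hat\sigma\,\Bigl(g\bigl(1-\sqrt{1-\rho^2}\bigr) + \sqrt{1-\rho^2}\Bigr) z .
\end{align*}
```

Matching these with an interval whose centre is $\hat\Theta - \sqrt{v_{11}}\,\hat\sigma\, b(x)$ and whose half-width is $\sqrt{v_{11}}\,\hat\sigma\, s(|x|)$ gives the stated $b$ and $s$.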

Appendix A. Invariance arguments 

In this appendix we provide a motivation for considering a confidence interval
for $\theta$ of the form (1) where $b : \mathbb{R} \to \mathbb{R}$ is constrained to be an
odd function and $s : [0,\infty) \to [0,\infty)$. We provide this motivation through the
invariance arguments listed below. Traditional invariance arguments (see e.g. Casella
and Berger (2002, section 6.4)) do not include considerations of the available prior
information. The novelty in the present appendix is that the invariance arguments need
to take proper account of the prior information. Suppose that we have uncertain prior
information that $\tau = 0$. Remember that the parameter of interest $\theta$ is defined
to be $a^T \beta$.

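Our first step is to reduce the data to $(\hat\Theta, \hat\tau, \hat\sigma)$. Note that
$(\hat\Theta, \hat\tau)$ and $\hat\sigma$ are independent random vectors with

$$\begin{bmatrix} \hat\Theta \\ \hat\tau \end{bmatrix} \sim N\left( \begin{bmatrix} \theta \\ \tau \end{bmatrix},\ \sigma^2 V \right)$$

and $(n-p)\hat\sigma^2/\sigma^2 \sim \chi^2_{n-p}$. Consider a confidence interval

$$[\ell(\hat\Theta, \hat\tau, \hat\sigma),\ u(\hat\Theta, \hat\tau, \hat\sigma)] \qquad (A.1)$$

for $\theta$ where $\ell : \mathbb{R} \times \mathbb{R} \times [0,\infty) \to \mathbb{R}$ and
$u : \mathbb{R} \times \mathbb{R} \times [0,\infty) \to \mathbb{R}$.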
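
Invariance Argument 1

Let $c$ be a real number. The model for the reduced data may be re-expressed

$$\begin{bmatrix} \hat\Theta^\dagger \\ \hat\tau^\dagger \end{bmatrix} \sim N\left( \begin{bmatrix} \theta^\dagger \\ \tau^\dagger \end{bmatrix},\ (\sigma^\dagger)^2 V \right)$$

where $\theta^\dagger = \theta + c$, $\tau^\dagger = \tau$, $\sigma^\dagger = \sigma$,
$\hat\Theta^\dagger = \hat\Theta + c$ and $\hat\tau^\dagger = \hat\tau$. Also, let
$\hat\sigma^\dagger = \hat\sigma$. Note that $(\hat\Theta^\dagger, \hat\tau^\dagger)$ and
$(\hat\sigma^\dagger)^2$ are independent random vectors with
$(n-p)(\hat\sigma^\dagger)^2/(\sigma^\dagger)^2 \sim \chi^2_{n-p}$.
The uncertain prior information may be re-expressed as $\tau^\dagger = 0$.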

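This re-expressed model and prior information have the same form as the original
model and prior information. Thus the confidence interval
$[\ell(\hat\Theta^\dagger, \hat\tau, \hat\sigma),\ u(\hat\Theta^\dagger, \hat\tau, \hat\sigma)]$
for $\theta^\dagger$ must lead to a confidence interval for $\theta$ that is identical to
(A.1). This implies that
$\ell(\hat\Theta, \hat\tau, \hat\sigma) = \hat\Theta + \tilde\ell(\hat\tau, \hat\sigma)$ and
$u(\hat\Theta, \hat\tau, \hat\sigma) = \hat\Theta + \tilde u(\hat\tau, \hat\sigma)$, where
$\tilde\ell : \mathbb{R} \times [0,\infty) \to \mathbb{R}$ and
$\tilde u : \mathbb{R} \times [0,\infty) \to \mathbb{R}$.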

Invariance Argument 2 

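Let $c$ be a positive number. The model for the reduced data may be re-expressed

$$\begin{bmatrix} \hat\Theta^\dagger \\ \hat\tau^\dagger \end{bmatrix} \sim N\left( \begin{bmatrix} \theta^\dagger \\ \tau^\dagger \end{bmatrix},\ (\sigma^\dagger)^2 V \right)$$

where $\theta^\dagger = c\theta$, $\tau^\dagger = c\tau$, $\sigma^\dagger = c\sigma$,
$\hat\Theta^\dagger = c\hat\Theta$ and $\hat\tau^\dagger = c\hat\tau$. Also, let
$\hat\sigma^\dagger = c\hat\sigma$. Note that $(\hat\Theta^\dagger, \hat\tau^\dagger)$ and
$(\hat\sigma^\dagger)^2$ are independent random vectors with
$(n-p)(\hat\sigma^\dagger)^2/(\sigma^\dagger)^2 \sim \chi^2_{n-p}$. The uncertain prior
information may be re-expressed as $\tau^\dagger = 0$.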

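This re-expressed model and prior information have the same form as the original
model and prior information. Thus the confidence interval
$[\hat\Theta^\dagger + \tilde\ell(\hat\tau^\dagger, \hat\sigma^\dagger),\ \hat\Theta^\dagger + \tilde u(\hat\tau^\dagger, \hat\sigma^\dagger)]$
for $\theta^\dagger$ must lead to a confidence interval for $\theta$ that is identical to
$[\hat\Theta + \tilde\ell(\hat\tau, \hat\sigma),\ \hat\Theta + \tilde u(\hat\tau, \hat\sigma)]$
for $\theta$. This implies that
$\ell(\hat\Theta, \hat\tau, \hat\sigma) = \hat\Theta - \tilde b(\hat\tau/\hat\sigma)\hat\sigma - \tilde s(\hat\tau/\hat\sigma)\hat\sigma$
and
$u(\hat\Theta, \hat\tau, \hat\sigma) = \hat\Theta - \tilde b(\hat\tau/\hat\sigma)\hat\sigma + \tilde s(\hat\tau/\hat\sigma)\hat\sigma$,
where $\tilde b : \mathbb{R} \to \mathbb{R}$ and $\tilde s : \mathbb{R} \to [0,\infty)$.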
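
The step from scale invariance to this form can be spelled out as follows. The sketch writes $\tilde\ell$ and $\tilde u$ for the two-argument functions obtained in Invariance Argument 1 and $\tilde b$, $\tilde s$ for the functions appearing in the conclusion of Invariance Argument 2; this tilde notation is a labelling assumption made here, and $\hat\sigma > 0$ is assumed.

```latex
\begin{align*}
\tilde\ell(c\hat\tau, c\hat\sigma) &= c\,\tilde\ell(\hat\tau, \hat\sigma), &
\tilde u(c\hat\tau, c\hat\sigma) &= c\,\tilde u(\hat\tau, \hat\sigma)
  && \text{for all } c > 0, \\
\tilde\ell(\hat\tau, \hat\sigma) &= \hat\sigma\,\tilde\ell(\hat\tau/\hat\sigma, 1), &
\tilde u(\hat\tau, \hat\sigma) &= \hat\sigma\,\tilde u(\hat\tau/\hat\sigma, 1)
  && \text{(taking } c = 1/\hat\sigma\text{)}, \\
\tilde b(x) &= -\tfrac{1}{2}\bigl(\tilde\ell(x,1) + \tilde u(x,1)\bigr), &
\tilde s(x) &= \tfrac{1}{2}\bigl(\tilde u(x,1) - \tilde\ell(x,1)\bigr) \ge 0 .
\end{align*}
```

The first line expresses that dividing the interval for $\theta^\dagger = c\theta$ by $c$ must return the interval for $\theta$. The last line then gives $\tilde\ell(x,1) = -\tilde b(x) - \tilde s(x)$ and $\tilde u(x,1) = -\tilde b(x) + \tilde s(x)$.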

Invariance Argument 3 

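The model for the reduced data may be re-expressed

$$\begin{bmatrix} \hat\Theta^\dagger \\ \hat\tau^\dagger \end{bmatrix} \sim N\left( \begin{bmatrix} \theta^\dagger \\ \tau^\dagger \end{bmatrix},\ (\sigma^\dagger)^2 V \right)$$

where $\theta^\dagger = -\theta$, $\tau^\dagger = -\tau$, $\sigma^\dagger = \sigma$,
$\hat\Theta^\dagger = -\hat\Theta$ and $\hat\tau^\dagger = -\hat\tau$. Also, let
$\hat\sigma^\dagger = \hat\sigma$. Note that $(\hat\Theta^\dagger, \hat\tau^\dagger)$ and
$(\hat\sigma^\dagger)^2$ are independent random vectors with
$(n-p)(\hat\sigma^\dagger)^2/(\sigma^\dagger)^2 \sim \chi^2_{n-p}$.
The uncertain prior information may be re-expressed as $\tau^\dagger = 0$.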






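This re-expressed model and prior information have the same form as the original
model and prior information. Thus the confidence interval

$$\left[\hat\Theta^\dagger - \tilde b\!\left(\frac{\hat\tau^\dagger}{\hat\sigma^\dagger}\right)\hat\sigma^\dagger - \tilde s\!\left(\frac{\hat\tau^\dagger}{\hat\sigma^\dagger}\right)\hat\sigma^\dagger,\ \hat\Theta^\dagger - \tilde b\!\left(\frac{\hat\tau^\dagger}{\hat\sigma^\dagger}\right)\hat\sigma^\dagger + \tilde s\!\left(\frac{\hat\tau^\dagger}{\hat\sigma^\dagger}\right)\hat\sigma^\dagger\right]$$

for $\theta^\dagger$ must lead to a confidence interval for $\theta$ that is identical to the
confidence interval

$$\left[\hat\Theta - \tilde b\!\left(\frac{\hat\tau}{\hat\sigma}\right)\hat\sigma - \tilde s\!\left(\frac{\hat\tau}{\hat\sigma}\right)\hat\sigma,\ \hat\Theta - \tilde b\!\left(\frac{\hat\tau}{\hat\sigma}\right)\hat\sigma + \tilde s\!\left(\frac{\hat\tau}{\hat\sigma}\right)\hat\sigma\right]$$

for $\theta$. This implies that $\tilde b$ is an odd function and
$\tilde s : [0,\infty) \to [0,\infty)$.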
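
The implication can be made explicit as follows. The sketch writes $x = \hat\tau/\hat\sigma$ and uses $\tilde b$, $\tilde s$ for the functions from Invariance Argument 2 (a labelling assumption made here). The interval for $\theta^\dagger = -\theta$ is converted into an interval for $\theta$ by negating and swapping its endpoints.

```latex
\begin{align*}
&\bigl[\hat\Theta + \tilde b(-x)\hat\sigma - \tilde s(-x)\hat\sigma,\
       \hat\Theta + \tilde b(-x)\hat\sigma + \tilde s(-x)\hat\sigma\bigr]
 = \bigl[\hat\Theta - \tilde b(x)\hat\sigma - \tilde s(x)\hat\sigma,\
         \hat\Theta - \tilde b(x)\hat\sigma + \tilde s(x)\hat\sigma\bigr] \\
&\Longrightarrow\quad \tilde b(-x) = -\tilde b(x), \qquad \tilde s(-x) = \tilde s(x)
 \quad \text{for all } x \in \mathbb{R}.
\end{align*}
```

Since $\tilde s$ is even, it is determined by its values on $[0,\infty)$ and may be treated as a function of $|\hat\tau|/\hat\sigma$.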

Now define the functions $b(x) = (1/\sqrt{v_{11}})\,\tilde b(\sqrt{v_{22}}\,x)$ for all
$x \in \mathbb{R}$ and $s(x) = (1/\sqrt{v_{11}})\,\tilde s(\sqrt{v_{22}}\,x)$ for all
$x \ge 0$. Since $\tilde b$ is constrained to be an odd function, $b$ is also an odd
function. Also, since $\tilde s : [0,\infty) \to [0,\infty)$,
$s : [0,\infty) \to [0,\infty)$. The confidence interval (A.1) is therefore equal to
$J(b,s)$ where $b : \mathbb{R} \to \mathbb{R}$ is an odd function and
$s : [0,\infty) \to [0,\infty)$.

Appendix B. Proof of Theorem 1 

Proof of part (a). 

The random vectors $(G, H)$ and $W$ are independent. It follows from (2) that the
probability density function of $H$, evaluated at $h$, is $\phi(h - \gamma)$. Thus

$$c(\gamma; b, s, \rho) = \int_0^\infty \int_{-\infty}^{\infty} \int_{\ell(h,w)}^{u(h,w)} f_{G|H}(g|h)\, dg\ \phi(h-\gamma)\, dh\ f_W(w)\, dw \qquad (B.1)$$

where $f_W$ denotes the probability density function of $W$ and $f_{G|H}(g|h)$ denotes
the probability density function of $G$ conditional on $H = h$, evaluated at $g$. The
probability distribution of $G$ conditional on $H = h$ is
$N(\rho(h - \gamma), 1 - \rho^2)$. Thus the right hand side of (B.1) is equal to

$$\int_0^\infty \int_{-\infty}^{\infty} k(h, w, \gamma, \rho)\, \phi(h-\gamma)\, dh\ f_W(w)\, dw. \qquad (B.2)$$

The standard $1-\alpha$ confidence interval $I$ has coverage probability $1-\alpha$. Hence

$$1 - \alpha = \int_0^\infty \int_{-\infty}^{\infty} k^\dagger(h, w, \gamma, \rho)\, \phi(h-\gamma)\, dh\ f_W(w)\, dw. \qquad (B.3)$$

Subtracting (B.3) from (B.2) and noting that $b(x) = 0$ for all $|x| \ge d$ and
$s(x) = t_{n-p,1-\alpha/2}$ for all $x \ge d$, we find that

$$c(\gamma; b, s, \rho) = (1-\alpha) + \int_0^\infty \int_{-dw}^{dw} \bigl(k(h, w, \gamma, \rho) - k^\dagger(h, w, \gamma, \rho)\bigr)\, \phi(h-\gamma)\, dh\ f_W(w)\, dw.$$

Changing the variable of integration from $h$ to $x = h/w$ in the inner integral, we
obtain (6). Using the fact that
it may be shown that $P(\theta \in J(b,s))$ is an even function of $\gamma$.

Proof of part (b). 

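The random variables $H$ and $W$ are independent. It follows from (2) that the
probability density function of $H$, evaluated at $h$, is $\phi(h - \gamma)$. Thus

$$e(\gamma; s) = \frac{1}{t_{n-p,1-\alpha/2}\, E(W)} \int_0^\infty \int_{-\infty}^{\infty} s\!\left(\frac{|h|}{w}\right) \phi(h-\gamma)\, dh\ w\, f_W(w)\, dw \qquad (B.4)$$

where $f_W$ denotes the probability density function of $W$. Obviously,

$$1 = \frac{1}{t_{n-p,1-\alpha/2}\, E(W)} \int_0^\infty \int_{-\infty}^{\infty} t_{n-p,1-\alpha/2}\, \phi(h-\gamma)\, dh\ w\, f_W(w)\, dw. \qquad (B.5)$$

Note that $s(x) = t_{n-p,1-\alpha/2}$ for all $x \ge d$. Subtracting (B.5) from (B.4) we
therefore obtain

$$e(\gamma; s) = 1 + \frac{1}{t_{n-p,1-\alpha/2}\, E(W)} \int_0^\infty \int_{-dw}^{dw} \left( s\!\left(\frac{|h|}{w}\right) - t_{n-p,1-\alpha/2} \right) \phi(h-\gamma)\, dh\ w\, f_W(w)\, dw.$$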

Changing the variable of integration in the inner integral from $h$ to $x = h/w$, we
obtain (7).
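
For reference, the change of variable can be written out as below. This is a direct calculation from the expression obtained after subtracting (B.5) from (B.4), using $h = wx$ and $dh = w\, dx$; the form in which the result is stated in the main text may be arranged differently.

```latex
\begin{align*}
e(\gamma; s)
 &= 1 + \frac{1}{t_{n-p,1-\alpha/2}\, E(W)}
    \int_0^\infty \int_{-dw}^{dw}
    \Bigl( s\bigl(|h|/w\bigr) - t_{n-p,1-\alpha/2} \Bigr)\,
    \phi(h - \gamma)\, dh \; w\, f_W(w)\, dw \\
 &= 1 + \frac{1}{t_{n-p,1-\alpha/2}\, E(W)}
    \int_0^\infty \int_{-d}^{d}
    \bigl( s(|x|) - t_{n-p,1-\alpha/2} \bigr)\,
    \phi(wx - \gamma)\, dx \; w^2 f_W(w)\, dw .
\end{align*}
```

The inner limits become $\pm d$ because $|h| \le dw$ is equivalent to $|x| \le d$ for $w > 0$.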